-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Python: Improve performance of FileNotClosed query by using basic block reachability #19641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Python: Improve performance of FileNotClosed query by using basic block reachability #19641
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves the performance of the FileNotClosed query by replacing full CFG reachability checks with basic-block–level reachability approximations and adds a test to illustrate the approximation’s limitations.
- Introduce
bbSuccessor
,bbReachableStrict
, andbbReachableRefl
predicates for faster reachability. - Update
guardsExceptions
inFileClose
(and itsWithStatement
override) to use these new predicates. - Add a new
closed29
test demonstrating a spurious false positive due to the approximation.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
python/ql/test/query-tests/Resources/FileNotAlwaysClosed/resources_test.py | Add closed29 test case that tags a spurious false positive with $SPURIOUS . |
python/ql/src/Resources/FileNotAlwaysClosedQuery.qll | Replace full CFG reachability with block-level reachability and refine exception guards. |
Comments suppressed due to low confidence (1)
python/ql/test/query-tests/Resources/FileNotAlwaysClosed/resources_test.py:285
- [nitpick] In
closed29
, the variablef28
duplicates the name fromclosed28
. Consider renaming it tof29
(or a more descriptive name) to avoid confusion.
f28 = open(path) # $SPURIOUS:notClosedOnException
Co-authored-by: Copilot <[email protected]>
private predicate cfgGetASuccessorPlus(ControlFlowNode src, ControlFlowNode sink) = | ||
fastTC(cfgGetASuccessor/2)(src, sink) | ||
private predicate bbReachableStrict(BasicBlock src, BasicBlock sink) = | ||
fastTC(bbSuccessor/2)(src, sink) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should be able to use doubleBoundedFastTC
here, for even better performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That requires extra predicates to restrict the source and sink, right?
I'm not sure what those predicates should be.
bbReachableRefl(raises.asCfgNode().getBasicBlock().getAnExceptionalSuccessor(), | ||
this.asCfgNode().getBasicBlock()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case they belong to the same basic block, there is no logic guaranteeing that raises
comes before this
in that basic block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raises
must come before this
regardless because we're checking the that exceptional successor of raises
reaches this
on the basic block level.
Taking a step back and considering semantics: This seems like a clear-cut use-case for post-dominance. I.e. a file is always closed if the open is post-dominated by a close. So we're looking for opens that can reach a close, but aren't post-dominated. Is there any reason not to simplify the query implementation to that? |
Possibly would make sense to use dominance, just wasn't sure exactly how it worked. I suppose it would also behave differently with conditionals; i.e. if use_stdin:
file = sys.stdin
else:
file = open(path)
try:
line = file.readline()
finally:
if not use_stdin:
file.close() currently we wouldnt alert there, filtering some potential FPs in which the conditional close is in fact correct, at the cost of maybe some FNs in more complex cases. Would a dominance based version alert there by default? |
First off, yes, dominance would handle exception edges. Second, you're right that we need to handle conditionals to avoid an FP in the example you give. Consider the set of basic blocks that may lie on a path between open and close. What we're looking for is one such block, A, with a successor, B, that steps out of this set. If this was all we needed then post-dominance would give us the answer, but we need to allow certain A->B steps as ok, e.g. the false-successor in The ambitious approach would involve actually analysing the branch edge A->B to see if we could prove it irrelevant given the context of the file-open, but that's probably best delegated to a proper shared control-flow reachability library. So let's just use the heuristic that you already outlined, i.e. conditional branching is probably fine, whereas we're looking for exception edges as those that might indicate bugs. To put this together we can define a
Finally we can check whether the open will reach a close: |
cc @nickrolfe , @aschackmull , @alexet